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Abstract 

We consider stochastic matrix models for population driven by ran- 
dom environments which form a Markov chain. The top Lyapunov ex- 
ponent a, which describes the long-term growth rate, depends smoothly 
on the demographic parameters (represented as matrix entries) and on 
the parameters that define the stochastic matrix of the driving Markov 
chain. The derivatives of a — the "stochastic elasticities" — with re- 
spect to changes in the demographic parameters were derived by Tul- 
japurkar (1990). These results are here extended to a formula for the 
derivatives with respect to changes in the Markov chain driving the 
environments. We supplement these formulas with rigorous bounds on 
computational estimation errors, and with rigorous derivations of both 
the new and the old formulas. 

1 Introduction 

Stochastic matrix models for structured populations are widely used in 
evolutionary biology, demographic forecasting, ecology, and population 
viability analysis (e.g., Tuljapurkar (1990); Lee and Tuljapurkar (1994); 
Morris and Doak (2002); Caswell (2001); Lande et al. (2003)). In 
these models, a discrete-time stochastic process drives changes in envi- 
ronmental conditions that determine the population's stage-transition 
rates (survival, fertility, growth, regression and so on). Population 
dynamics are described by a product of randomly chosen population 
projection matrices. In most biological situations the population's 
stage structure converges to a time-varying but stable structure Co- 
hen (1977), and in the long run the population grows at a stochastic 
growth rate a that is not random and is the leading Lyapunov exponent 
of the random product of population projection matrices Furstenberg 
and Kcstcn (1960); Cohen (1977); Lange (1979); Lange and Holmes 
(1981); Tuljapurkar and Orzack (1980). This growth rate a is of con- 
siderable biological interest, as a fitness measure for a stage-structured 
phenotype Tuljapurkar (1982), as a determinant of population viabil- 
ity and persistence Tuljapurkar and Orzack (1980); Morris and Doak 
(2002); Lande et al. (2003), and in a variety of invasion problems in 
evolution and epidemiology Metz et al. (1992). 

The map between environments and projection matrices describes 
how phenotypes change with environments, i.e., the phenotypic norm 
of response, and we are often interested in how populations respond 
to changes in, say, the mean or variance of the projection matrix ele- 
ments. Such questions are answered by computing the derivatives of 
a with respect to changes in the projection matrices, using a formula 
derived by Tuljapurkar (1990). Tuljapurkar et al. (2003) called these 
derivatives stochastic elasticities, to contrast with the elasticity of the 
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dominant eigenvalue of a fixed projection matrix to the elements of 
that matrix (Caswell, 200f). Stochastic elasticity has been used to 
examine evolutionary questions (Haridas and Tuljapurkar, 2005) and 
the effects of climate change (Morris et al., 2008). At the same time, 
a is also a function of the stochastic process that drives environments. 
Many processes, such as climate change (Boyce et al., 2006), will re- 
sult in changes in the frequencies of, or the probabilities of transition 
between, environmental states. William Morris (personal communica- 
tion 2005) posed the question: how is a affected by a change in the 
pattern and distribution of environments, rather than by a change in 
the population projection matrices? To answer his question, we con- 
sider a model in which the environment makes transitions among one 
of several discrete states, according to a Markov chain. Then what 
we want is the derivative of a with respect to changes in the transition 
probabilities of this Markov chain. This derivative exists (at least away 
from the boundaries of the space of stochastic matrices), and in fact 
we know from Peres (1992) that a is an analytic function of the pa- 
rameters of both the projection matrices and the parameters defining 
the stochastic matrix, in an open neighborhood of the set of stochastic 
matrices. In deterministic models (Caswell, 2001), the growth rate is 
represented as A = e r ; then sensitivities are derivatives of the form 
(dX/dx) with respect to a parameter x whereas elasticities are pro- 
portional derivatives of the form (dr/dlogx). In stochastic models we 
compute derivatives of a, and these can be used to compute elasticities 
(as in Tuljapurkar et al. (2003)) or sensitivities. 

Our first contribution here is a new formula for computing the 
derivative of a with respect to changes in the transition probabilities 
of the environmental Markov chain. To obtain this result we show how 
an initial environmental state affects future growth, using coupling and 
importance sampling; this analysis may be of independent interest. 
Even with a formula in hand we must compute derivatives of a by 
numerical simulation which is subject to both sampling (Monte Carlo) 
error and bias. Our second contribution here is to show how one can 
bound these estimation errors. Our third contribution is a rigorous 
proof of the heuristically derived formula given by Tuljapurkar (1990) 
for the derivatives of a to the elements of the population projection 
matrices. 

In Section 2 of this paper we set out the model and assumptions, 
the approach to finding derivatives, along with necessary facts about 
the convergence of population structures and distributions. In Sec- 
tion 3 we discuss systematic and sampling errors and show how we 
can bound them. We illustrate this approach in Section 4 by present- 
ing bounds (in Theorem 1) for simulation estimates of the stochastic 
growth rate a and (in Theorem 2) for the derivatives of a with respect 
to projection matrix elements. In Section 5 we define a measure of 
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the effect of an initial environmental state on subsequent population 
growth and show how to estimate this measure using coupling argu- 
ments. Section 6 presents (in Theorem 4) the formula, algorithm, and 
error bounds for the derivative of a with respect to the elements of 
the Markov chain that drives environments. We end by discussing 
how these theorems can be applied and some related issues concerning 
parameter estimation in such models. Proofs are in the Appendix. 

2 Model, Convergence and Stationary Dis- 
tributions 

We consider a population whose individuals exist in K different stages 
(these may be, for example, ages, developmental stages or size classes). 
Newborns are in stage i = 1. The progression between stages occurs at 
discrete time intervals at rates that depend on the environment in each 
time interval. The environment et in period t is in one of M possible 
states; we denote the set of possible environments by A4 = {1, . . . , M}. 
Individuals in stage i at time t move to stage j at a rate X et+1 (j,i). 
These rates are elements of a nonncgative population projection ma- 
trix, and at time t when the environment is et this matrix is denoted 
by X et ; there are M such matrices, one for each environmental state. 
We assume that allocation of individuals to classes and the identifica- 
tion of environment states are certain. We also assume that the total 
number of individuals in the population is large enough that we can 
ignore sampling variation. Successive environments are chosen accord- 
ing to a Markov process with transition matrix P whose elements arc 
P(e,e') and whose stationary distribution is v = -{V(e))}. We follow 
the standard convention for Markov chains, that P(e, e') represents 
the probability of a transition from state e to state e'; note that this is 
the opposite of the convention used in matrix population models. In 
some places we specialize to the case when the environments are i.i.d. 
(independent and identically distributed), with distribution v. 

To guarantee demographic weak ergodicity (Cohen 1977) we assume 
that 

(i) Each row of each population projection matrix has at least one 
positive entry. 

(ii) There exists some R > such that any product X ei ■ ■ ■ X ea has 
all entries positive. 

(iii) In the case of i.i.d. distributions, all v e are positive. In the Markov 
case, the chain is assumed to be ergodic (so transitive and aperi- 
odic), and environments are in the stationary distribution of P, 
which will also be denoted by v(e). 
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The population in year t is represented by a vector N t e with 
AT t = {(n t (l), . . . ,n t (k)) T : n t (i) > 0}. The superscript T will al- 
ways mean transpose; here it indicates that population vectors are 
column vectors. (There may be population classes early on that have 
members; condition (ii) above forces all classes eventually to have 
positive membership, and so we assume without loss of generality that 
we start with all population classes occupied.) The population struc- 
ture changes according to N t +i — X et+1 N t , and 

N t =X et X et _ 1 ---X ei N . (1) 

The normalized population structure Nt '■= ^t/QZi Nt{i)) does not 
converge to a fixed limit (as it would if the environment were constant) 
but it does converge in distribution. The long-run growth rate is not 
random and for each stage i, 

a := lim r 1 logN t (i) = lim r 1 log p^TT^ 

exists, and is the same in every realization. This a is called the 
"stochastic growth rate" . 

Counting the K 2 parameters in each matrix, there are at most 
(MK 2 + M 2 ) parameters. While all parameters must be nonnegative, 
and all elements of the population projection matrices but the birth 
rates must be < 1, the only universal constraint is ^2 eeM P(e', e) = 1. 
(There may, however, be further constraints imposed, as some transi- 
tions may be impossible. If we are considering age-structured popula- 
tions, the matrices are Leslie matrices, each with only 2K — 1 poten- 
tially nonzero parameters.) The sensitivities we examine are deriva- 
tives of a with respect to these parameters. When evaluating sensitiv- 
ities we work in an explicit basis in which perturbations are described 
by an appropriate matrix and we refer to change "in the direction of" 
that matrix. This will be made precise in the analyses that follow. 

2.1 Convergence 

We denote the i-T-dimensional column vector with l's in all places by 
1. By default we use the L 1 norm ||x|| :— \ x i\ when x is a vector in 
R^- , and write ||X|| := ||X1|| = X^fj=i when X is a K x K matrix. 
Our assumptions imply that there are positive constants k and f, such 
that for any environments e\, . . . , e m , 

\log\\X em ...X ei ||| <k + mf (2) 

We use the Hilbert projective metric p, described in Golubitsky 
et al. (1975) and defined by 

p(x,y) := log max x(i)/y(i) +log max y(i)/x(i). (3) 

Ki<K Ki<K 
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This is a pseudometric onl^ that is a metric on § := {(x(l), . . . , x{k)) T 
x(i) > and x {i) = !}■ The distance between two vectors is defined 
by the ray from the origin; that is, p(x,y) = p(x/\\x\\,y/\\y\\). It has 
been shown by Bushell (1973) that 

i min{x(i)} + mm{y(i)} e - p ^ p(x, y) < \\x - y\\ < e p( - x ^ - 1 

for any x,y € S := {(x(l), . . . , x{k)) T : x(i) > and ^x{i) — 1}. 
(The bound is actually shown with respect to the Euclidean norm, but 
the same argument holds for any L p norm.) Thus, convergence in the 
projective metric implies convergence of the projections onto § in the 
standard norms. 

Following Lemma 1 of Lange (1979), we define a compact con- 
vex subset hi C S which is stable under the transformations u — > 
X e w/||X e u|| for any e e M. and includes the vector 1/K as well as 
all vectors of the form X eR ■ ■ ■ X ei y/\\X eR ■ ■ ■ X ei y\\ where y £ §; and 
a compact convex subset V C S T which is stable under the transforma- 
tions v T — > v T X e /\\v T X e \\ and includes the vector 1 T /K as well as all 
vectors of the form y T X ea ■ ■ ■ X ei /\\y T X eR ■ ■ ■ X ei \\, where y T € § T . A 
useful fact about this metric follows: if u,u' e S, and v T any positive 
row vector, then 

• f u (i) 1 , vTu , f u (i) 1 

log mm <^ — - \ < log < logmax <^ — — \ . 
[ u'yi) J v 1 w ^ u'(i) J 

Since u, u' e S, it follows that the left-hand side is < and the right- 
hand side > 0, so 

\logv T u — logv T u'\ < p(u, u). (4) 

From (Lange and Holmes, 1981, Theorem 2) we know that there 
exist constants /ci,fc 2 ,r, with < r < 1, such that for any u, v! e U 
and environments e\,. . . ,e m , 

p (X em ■ ■ ■ X ei u, X em --- X ei u') < k ir m p{u, u 1 ) < k 2 r m . (5) 

Of course, the constants may be chosen so that the same relation holds 
for the transposed matrices, with u T , (u') T e V. 

It follows (as in Lemma 2 of Lange and Holmes (1981)) that for 
any u, u' £ U and environments e, e\, . . . , e m , 



lQg \\X e X em ...X ei u\\ bg \\X e X em ■ ..X ei u'\\ 



■■■X ei u\\ \\X em ■ ■ ■ X ei u' 



<k ir m p(u,u') 

(6) 

< k 2 r m . 



The same relation holds when the matrices X are replaced by their 
transposes, with u T , (u') T e V. 
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Since \\X\\ = \\X1\\, and 1/K is in hi, it immediately follows that 
if e[, . . . , e\ are any other environments, 



\X e X em ■ ■ ■ X ei || , _ \\x e x em ■ ■ ■ x et+1 x e , ■ ■ ■ x e 



lo S n v m TT-fi lo g 



\X e _ ■ ■ ■ X ei || ||^e m • • • X ei+1 X e > ■ ■ ■ X e ; I 



< k 2 r ri 



(7) 

We note that the results in Lange and Holmes (1981) depend only 
on the set of matrices X being compact, not on it being finite. In 
section 4 and beyond we will be letting the matrices X e and/or the 
transition matrix P depend smoothly on a parameter x, which will 
take values either in [— xo,xo] or [0,xo]. We may then choose the sets 
hi and V and constants k\,k2,r such that the properties above — in 
particular, the stability of U and V and the bounds (5) and (6) — hold 
simultaneously for all values of the parameter. 



2.2 Time reversal and the stationary distribution 

The transition matrix for the time-reversal of P will be denoted P, 
and is given by 

P (e , e0:= fM P(e ., e) . 

A standard result (for example, see Theorem 6.5.1 of Grimmett and 
Stirzakcr (2001)) tells us that if e±, e 2 , ■ ■ ■ , e m form a stationary Markov 
chain with transition matrix P, for a fixed m, the reversed sequence 
e m , e m _i, . . . , ei is a Markov chain with transition matrix P. 

As described in Lange and Holmes (1981), if e\, e2, . . . forms a sta- 
tionary Markov chain with transition probabilities P, there is a unique 
distribution it on hi x M which is stable under the transformation 
(u,ei) i— > (X ei u/\\ A ei u|| , ej + i). That is, if the normalized population 
structure N t paired with et+i is chosen from the distribution ir, then 
the pair (A^ + i,e t+2 ) will also be in the distribution n. Furthermore, 

(i) For any initial population distribution Nq, and any initial envi- 
ronment e , the random pair (N t ,e t +i) converges in distribution 

tO 7T. 

(ii) For a population distribution u e hi, choose any random se- 
quence of environments eo, ei, . . . . We may define a sequence of 
random vectors Ut := X ei ■ ■ ■ X et uo/\\ X ei ■ ■ ■ X et ua\\. Then Ut 
converges pointwise to a random vector Uoo :— lim^oo U t - If we 
identify uq with Nq , let the sequence of environments be realized 
from the time-reversed chain P, and eo from the stationary dis- 
tribution v, then (U t , eo) has the same distribution as (N t , e t+ i). 
Hence, the distribution of (C/oo, eo) is tt. Furthermore, if uq e U, 
we obtain directly from (5) 

p(U oc ,U t )<k 2 r t . (8) 
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The same holds true, of course, if we reverse the matrix multi- 
plication: Starting from any nonnegative Af-dimcnsional row vector 
Vq, we define from a sequence of environments eo, ei, . . . the sequence 
of row vectors V t T := v%X et ■ ■ ■ X ei /\\v£X et ■ ■ ■ X ei || e V. Then V t 
converges pointwise to a random vector Vac :— lim^oo Vt- When the 
sequence of environments has been chosen from the chain P, we denote 
the distribution of (V^, eo) by jr. As before, tt is the stationary distri- 
bution for the Markov chain on V x M, defined by taking (v T , e*) at 
time t to (v T X e J\\v T X et \\,e t +i) at time t+1, where the environments 
(eo, ei, . . . ) are drawn from the backward chain. 

We also define the regular conditional distributions tt e on U as 
follows: pick (U, e) from the distribution tt, conditioned on e being e, 
and take ir e to be the distribution of U. Similarly we define jr e . In the 
case of i.i.d. environments, of course, ir and n are simply products of 
an independent population vector and environment; the environment 
has distribution v, and the stationary population distributions we also 
denote (by an abuse of notation) by ir and jr. 

2.3 Estimating contraction rates 

The constants r and k\, defined in (5), are crucial to the analysis at 
several stages. We describe here how to obtain plug-in estimates for 
these quantities. This is by no means the most efficient algorithm, nor 
does it obtain the best bounds, but it should be feasible for problems 
of moderate size. To begin, we let Y\, ... , Y5 be the collection of all 
products of the form X ei X e2 ■ ■ ■ X eR ; here S = M R . Following Bushell 
(1973) we define 

A(Y) ■=sup{p(Yx,Yx / ) :x,x'€lnt(RK)}. (9) 

Since for any positive vectors u, v! , u" we have p{u+vl ', u") < max{p(u, u"), p(u' , u")}, 
the maximum of p(u, u') among vectors in a cone is taken between ex- 
treme points of the cone, we can compute A(Y) = sup{p(Y^ , Y"( J '))} 
where yW and Y^ are two columns of the matrix Y. By Theorem 3.2 
of Bushell (1973) it follows that if we let r := maxi<i<5 tanh A(y )/4 
(which is < 1), then for all x, x' £ Int(lR^), and any e\, . . . , en € M, 

p(X ei ■ ■■X eR u,X ei ■ ■ -X eR u') < r p(u,u'). 

Thus, (5) holds with r = r^ R and k\ = r 1_fl . Since we want the 
bounds to hold for the reversed products as well, we repeat these com- 
putations with (X e ) replaced by (Xj), and finally adopt the larger 
values of r and fci . 
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3 Errors and How to Bound Them 



A standard approach to estimating a, and the derivatives that we give 
later on, is to choose a fixed starting vector uq € U, simulate sequences 
eo(i), ei(i), . . . , e m (i) independently from the stationary Markov chain 
with transition probabilities P (i = 1, . . . , J), and then compute 



a m := E 



log 

j 



\ X e m (i)X em _ 1 (i) ■ ■ ■ X ea ^U \\ 



(10) 



\ X e m -i(i) ' ' -^eo(i) u o|| 
J \\ X e m (i) X e m -!(i) ■ ■ ■ *eo(») u 0|| 

~ log ||-X"e m _i(i) ' ' ' ^e « u o|l) • 

It is important not only to know what would be an appropriate ap- 
proximation to a or its derivatives in the sense of being asymptotically 
correct, but also to have rigorous bounds for the error arising from 
any finite simulation procedure, such as (10). There are two sources 
of error: systematic error, arising from the fact that X eo ,u a is not 
exactly a sample from the distribution tt; and sampling error, arising 
from the fact that we have estimated the expectation by averaging over 
a random sample. 



3.1 Systematic error 

By "systematic error" we mean the error in our estimate of a arising 
from the difference between the distribution we are aiming for and 
the distribution we are actually sampling from. The quantity we are 
trying to estimate may be represented as a = tt[F], expectation of 
a certain function F with respect to the distribution tt. If we can 
simulate Z 1; . . . , Zj from it, then aj := J -1 J2j=i 1S an unbiased 

estimator of n[F], and will be consistent under modest assumptions on 
F and the independence of the samples. Suppose, though that what 
we have are not samples from tt, but samples from a "similar" 
distribution tt' . Then we can bound the error by 



a < 



3 = 1 



7T[F}-7T'[F} 



(11) 



Here the first term on the right-hand side is the sampling error, and 
the second term is the bias, the expected value of systematic error. 
The problem is that the bounds we can obtain for the bias are likely 
to be crude, absent good computational tools for the distribution tt 
(and if we could compute analytically from n, we wouldn't need to be 
simulating) . 
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Alternatively, if we can couple the samples Z'j from the approximate 
distribution it' to exact samples Xj from the distribution w, we can 
break up the error in a slightly different way: 



a — a 



< 



< 



< 



J-^F^-^F] + J- 1 Y\f{Z 3 )-F(Z' j ) 



3 = 1 
J 



J 



J- 1 £ F(Zj) - Tf [F] I + J" 1 £ ess sup{ \f(Zj) - F(Z>) I } 
j=i j=i 21 



(12) 

where the essential supremum in the last line is taken over the distri- 
bution of conditioned on Xj. Bounds for the sampling error in (11) 
will generally also be bounds for the first term in (12). The second 
term in (12), on the other hand, which takes the place of the bias, is 
a random variable, computed from the samples Xj. Its expectation is 
still a bound on the bias. The crucial fact is that the last line may be 
computable without knowing in detail what the "correct" sample Zj 
is. 

A small disadvantage of this approach is that the systematic error 
varies with the sample. To achieve a particular fixed error bound 
we need an adaptive approach, whereby we successively extend our 
sequence of matrices until the error crosses the desired threshold. In 
keeping with our comment in section 5, we note here that this approach 
to estimating the systematic error in simulations is essentially just a 
version of the Propp- Wilson algorithm. 



3.2 Sampling error 

The sampling error is difficult to control with current techniques, be- 
cause the distribution of the samples is so poorly understood — the 
very reason why we resort to the Monte Carlo approximation in the 
first place. The best we can do for a rigorous bound is to use Ho- 
cffding's inequality (see Hoeffding (1963)), taking advantage of crude 
bounds on the terms in the expectation. Hocffding's inequality tells us 
that if Xi, . . . ,Xj are i.i.d. random variables such that a < Xi < (3 
almost surely, then for any z > 0, 



{ j^A,-E[X] >z| <2e 



-2Jz 2 /{P-a) 2 



(13) 
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This is essentially the same bound that we would estimate from the 
normal approximation if the standard deviation of X were (/3 — a)/2. 
Of course, the standard deviation will be smaller than this, but we 
do not know how much smaller. An alternative approach then would 
be to use the bound 1t{z\[J /<x), where a is the standard deviation of 
the simulated samples, and r is the cumulative distribution function 
of the Student t distribution with J — 1 degrees of freedom. This will 
be a smaller bound, in that sense "better" , but not precisely true for 
finite samples, to the extent that the sample distribution is not normal. 
Generally we will want to fix p , the confidence level, and compute the 
corresponding z, which will be 



z o = (0-a)^-—log( Po /2). (14) 

The corresponding expression for the estimate based on the t-distribution 
is 

zo = ^=ti_ P0 / 2 (J- 1), (15) 

where t p (J — 1) is the p quantile of the Student T distribution with 
J — 1 degrees of freedom; that is, if T has this distribution then P{T > 
t p (J-l)}=p. 

4 Growth Rate and Sensitivity to Projec- 
tion Matrices 

We present here extensions of two known results. In these cases (and 
in later results) we start by defining an estimator that converges to 
the the quantity we desire, and follow that by bounds on the system- 
atic and sampling errors, as well as an error bound for estimates from 
a simulation estimator. We state our results on error bounds in the 
form "The quantity Q may be approximated by the expectation of A, 
with systematic error bounded by B and sampling error bounded by 
C(J,p)." This means that if A\, . . . ,Aj are independent realizations 
of A, then the probability that the true value of Q is not in the interval 
J~ x ^2 Ai±[B + C(J, p)] is no bigger than p. When describing an adap- 
tive bound on the systematic error, B will depend upon the particular 
simulation result. Again, the sampling error may be bounded either 
by a universally valid Hocffding bound, based on known upper bounds 
on the samples, or by the t distribution using the standard deviation 
estimated from the sample, which provides a generally much superior 
bound, but which can only be treated as an approximation. 
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4.1 Computing a 



The stochastic growth rate a is commonly estimated by numerical sim- 
ulation but, as discussed with examples by Caswell (2001), there is no 
general way to bound the errors in the estimated values. The following 
result provides suitable bounds. 

Theorem 4.1. Letu n be any fixed element of U , andY m := X Bm X em _ 1 ■ 
where eo,ei,... form a Markov chain with transition rates P. The 
stochastic growth rate may be approximated by the simulated expecta- 
tion of 

log "VY" ' (16) 

\\Y m Uo\\ 

with systematic error bounded by k^r m and sampling error at level p 
on J samples bounded by 

lpg sup ueU max egA< \\X e u\\ \ / - logp \ 1/2 ^ 



mf ueU m.m eeM \\X e u\\ J \ 2 J 
When the simulated expectation is 

1 ^ \\X eo (i)Y m (i)u \\ 
J 2^ g \\Y m {i)u \\ 

we may also bound the systematic error by 

J ^ ./ 

T 2~2 sup P( Y m(i)u,Y m (i)u') < - 



7 J 

-]T sup p(y m (*)«,y m (i)« , )<-2A(y ro (*)) ) (is) 

i=i J i=1 



w/iere A is defined as in (9). 



4.2 Derivatives with respect to Projection Matri- 
ces 

We need care in defining derivatives of a with respect to elements 
of the population projection matrices. As discussed in Tuljapurkar 
and Horvitz (2003) we must define how the matrix entries change, 
e.g., do we change fertility rates in a particular environment, or in 
all possible environments? Although the main formula here is known, 
Tuljapurkar's (1990) derivation did not justify a crucial exchange of 
limits (between taking the perturbation to zero and time to infinity). 
We provide a rigorous proof (see Appendix) and of course the error 
bounds here are new. 

We will suppose that the matrices X e depend smoothly on a pa- 
rameter x, so that we may define X e := dX e /dx, and we define the 
base matrices to be at x = 0. In some cases, the parametrization will 
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be defined only for x > 0, and in those cases we will understand the 
partial derivatives to be one-sided derivatives, and the limits lim^^o 
will be the one-sided limits linx^o- 

Theorem 4.2. Let U e and V e be independent random variables with 
distributions TT e and tt s . Then 



, da m 
a := ^ = ^J 6 

eeM 



VlX e U e 



(19) 



Each term may be approximated by averaging samples of the form 

y(m)T j{ Tj(m) 



u * 

e£M 



y(m)T Xe Tj(7n) ' 



(20) 



Xi x ■ ■ ■ Xg m Uq 



where uq, Vq are any fixed elements of U, V respectively, U^ Tl 
and y( m ) T = Vq ' X &m ■ ■ ■ X ei , e ~ eo, ei, . . . ,e m form a sample from 
the Markov chain P, and e = eo, e\, . . . , e m form a sample from the 
Markov chain P. The systematic error may be bounded uniformly by 

2(cxp(4/c 2 r m ) - l)a!, (21) 

while the sampling error at level p on J samples is bounded by 



2 E^ 



. , - e SUp 

eeM ueu^evv X, 
Suppose the simulated expectation is 



v X e u 



,,u 



log(p/2) \ 
2J J 



1/2 



(22) 



where 



Let 



v ve 4- {V {m) (3)) T XU {m) {3) 

ku j ,tt(^ (m) (i)) T Mwor 



U (m) (j) = X Sl{j) ■ ■ ■ X SmU) u =: Y m (j)u ano 
(V {m) {j)) T = vlX em{j) ■ ■ ■ X ei{3) =: vlY m (j). 



U{j) := Y m (j)U = {Y m (j)u :ueU), 
V(j) := VY m (j) = {v T Y m (j) : v T e V}. 
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Then we may also bound the systematic error by 




Vey, ^ v T X e u (V (m) (j)) T X e U (m) ti) 



v T eV(j) 




(V {m) (j)) T X e U( m \j) 
{VW(j)) T X e UWU) 




(23) 



Note that the bound (21) is given as a proportion of the unknown 
a'. It can be turned into an explicit bound by using an upper bound 
on a'. For instance, it is easy to compute that 



5 Environments and Coupling 

Suppose we make a small change in the transition matrix P, and want 
to compare population growth along environmental sequences gener- 
ated by the original and the perturbed matrix. We expect that the 
perturbed environmental sequences will only occasionally deviate from 
the environment that we "would have had" in the original distribu- 
tion of environments. Computing the derivative of a is then a matter 
of measuring the cumulative deviations due to these changes. In this 
section we take an essential first step: fix the transition matrix P and 
compare cumulative change in total population size when starting in 
environment e, as compared with starting in the stationary distribu- 
tion v. Variants of this problem arise in the standard Markov-Chain 
Monte Carlo (MCMC) problem: estimate by simulation the expecta- 
tion of a certain function from the stationary distribution of a Markov 
chain when that distribution is unknown. We need to measure the 
distance between distributions, and hence the difference between ex- 
pectations. A standard method for doing this is coupling. For an out- 
line of coupling techniques in MCMC, see Kendall (2005) and Roberts 
and Rosenthal (2004). We use coupling in two ways, corresponding to 
the two components of the Markov chain: the environment and the 
population vector. 



Lip(a) < sup maxcxp {p (\X e u\ , X e u 



)} 



1 -r 



(24) 
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Fix environments e and e' (possibly the same) . We define sequences 
eo, ei, . . . ; e' , e\, . . . ; and eo, ei, . . . : all three are Markov chains with 
transition probabilities P, but with eo = e, e = e' and eo having 
distribution u (so that (gj) is stationary). Then 

e < e := t hm (E [log ||X et ■ • • X eo ||] - E [log ||X e ; ■ • • X^\ 

C e := lim (E [log \\X et ■ ■ ■ X eo \\] — E [log ||X e - t • • • X~ eo ||]) 

A4 

e' = l 

Note that when the environments are i.i.d. — so P(e, e') = v e > — we 
have 

e < e -log||t/ T X e ||-log||T/' T ^||. 

Computing C, e depends on coupling the version of the Markov chain 
starting at e, to another version starting in the distribution v. We de- 
fine the coupling time r to be the first time such that e r = e T ; after 
this time the chains follow identical trajectories. If we know the distri- 
bution of t and of the sequences followed by the two chains from time 
to t, we can average the diferences in (25) to find The advantage 
of coupling is, first, that it reduces the variability of the estimates, 
and second, that we know from the simulation when the coupling time 
has been achieved, which gives bounds on the error. A suitable choice 
is Griffeath's maximal coupling (Griffcath, 1975) which we will apply 
in Pitman's (Pitman, 1976) path-decomposition representation. (The 
coupling is "maximal" in the sense of making the coupling time, and 
hence the variance of the estimate, as small as possible.) However we 
must be careful about sampling values of r because they may be large 
if the Markov chain mixes very slowly. To deal with this, we use a re- 
sampling technique to overweight coupling times that generate a large 
contribution to £. 

Beginning with a fixed environment e, the procedure is as follows: 

(i) Define the sequence of vectors a t :~ (a t (e') = P t (e 1 e r ) — v{e')). 
We also define af and to be the vectors of pointwise posi- 
tive and negative parts respectively. Let C(t) be any bound on 
|log||X ei ■ ■ ■ X et \\ — logWX,,^ ■■■Xg'Jl |, where the e^ and e\ are 

any environments. From (2) we know that 2k + 2tf is a possible 
choice for C(t). 

(ii) For pairs (i, e'), where t is a positive integer and e' € M, define 
a probability distribution 



q(t,e') r- 



v e if e = e', t = 0; 

ife^e',t = 0; 

^ [a^_ 1 P](e') — af(e') otherwise. 
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This is the distribution of the pair (r, e r ) for the maximally cou- 
pled chain. Define 

oo M 
i„=l e,=l 

and a probability distribution onNxM 

q(t,e')C(t) 



q(t,e') := 



A 



(iii) Average J independent realizations of the following random vari- 
able: Let (t, e') be chosen from the distribution q on N x M. 
Let (eo(r, e'), . . . , e T (r, e')) and (eo(r, e'), . . . , e T (r, e')) be a re- 
alization of the coupled pair of Markov chains with transition 
probabilities P and starting at eo(r, e') = e and e (r, e') with 
distribution v, conditioned on the coupling time being t and 
e r = e T = e' . These realizations are generated from independent 
inhomogeneous Markov chains running backward, with transition 
probabilities 



p {e 4 _i = x | e, = y] 
s {ei-i = x | h = y} 



The random variable is then 



ar_ 1 (x)P(x, y) 

E, M =i«r-i(^')^',2/)' 



z ._ A log \\ X e T (r,e') ' ' ' ^e (T,e') II 
C(t) \\Xg T ( Tte ,y • X So ( Tie t-)\\ 

(Note that the realizations corresponding to r = arc identically 
0. The possibility of r = has been included only to simplify the 
notation. In practice, we are free to condition on r > 0.) 

The change from q to q is an example of importance sampling (cf. 
Chapter V.l in Asmussen and Glynn (2007)). We oversample the val- 
ues of the random variable with high r to reduce the variability of 
the estimate. The importance sampling makes Z(j) a bounded ran- 
dom variable, with bound A. Imagine that we had a source of perfect 
samples V T (j) from the distribution 7r* m , and define 

. = A log W VT ^) X ^(r(j),e'Uy,3) ■ ■■ X e (r(j),e'U);j)\\ 
C( T (.?)) \\ VT (j) X e T (T(j),e'(j);j) ' " " X e (T(j),e'(j);j) II 

Let 

Y l(j) : = X e T (j) ■ ■ - X e (j), 

Y 2 {j) := X e T (j) ■ ■ ■ X e t (j) X e t - 1 (j) ' ' ' X e (j)- 
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Then 

\Z(j) Z(j)\ < (A(y 1 ( J )) + A(F 2 (j))) . (26) 
At the same time E[Z] = ( e , so we may use (14) to compute the bound 



Ce - n- 1 2 Z(j) > 2A\j-± J log(po/2) \ < p . (27) 



Lemma 5.1. The limits defining the coefficients e >( e and ( e exist and 
are finite. We may approximate ( e by 



J_ A 1q \\ X eArU),e'(j);j) ' ' ' X e (T(j),e' (J) ; j) I 

J ~t ^O')) ll X e T (r(i),e'(i)u) ' ' ' X e (T(j),e'(j);j) I 



// po is any positive number, the probability is no more than po that 
the error in this estimation is larger than 



- £ (A^j)) + A(Y 2 (j))) + 2^-i-log(po/2), (29) 



It remains to compute A From standard Markov chain theory 
there is an M x M matrix Q such that 

P = \v T + Q, 

a constant D > and a number £ e (0, 1) such that 

IIQ'II <D?, 

from which it follows easily that for any vector v, 

\\v T P t -u T \\ < D^\\v\\. (30) 

Then 

af = \ T e Q\ 

where l e is the vector with 1 in place e and elsewhere, and 

IH| <D?. 

If we use the bound C(t) = k + tf, then 



A = J2(k + tr)(\\a t - 1 \\-\\a t \ 



t=l 

OO 

fcllaiH+f^llotH (31) 
t=i 
Df 
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6 Derivatives with respect to Environmen- 
tal Transitions 



We are now ready to compute derivatives of a with respect to changes 
in the distribution of environments, as determined by P. Complicating 
the notation is the constraint {P : ^2 , P(e, e') = 1 for each e}; thus, 
there can be no sense in speaking of the derivative with respect to 
changes in P(e, e') for some particular e, e'. Instead, we must compute 
directional derivatives along the direction of some matrix W, in the 
plane J2 e ' W e ,e> = 0. For the purposes of this result we write a — a(P), 
set P e = P + eW, and wish to compute the directional derivative 
Vjya(P). One approach is to estimate 

e-'iaiPJ-aiP)), 

and analyze the limit as e — > 0. The perturbations eW are such that P e 
retains the ergodicity and irreducibility of P. (The result should be the 
same whether e is positive or negative. If P is on the boundary of the 
set of possible values, one or the other sign may be impossible. Some 
choices of W may be impossible in both directions.) In the special 
case in which W e . e > = 1 and W e , e " = —1, with all other entries 0, we 
are computing the derivative corresponding to a small increase in the 
rate of transitioning from environment e to e', and a decrease in the 
frequency of transitioning to e". 

We begin by describing separately the i.i.d. case, when the proba- 
bility v{e) that the environment is e is perturbed to v{e) + ew(e), and 
of course the sum of the w(e) is zero. 

Theorem 6.1. Suppose the environment process is i.i.d. with distri- 
bution v, and we are given w G M. K such that w e — 0. Express a 
as a function of v alone, with the matrices X\ , . . . , Xm assumed fixed. 
Then 

V w a = w e E [\og{V T X e U)\ , (32) 

where U, V T are independent random variables with distributions n and 
tt respectively. This may be approximated by averaging samples of the 
form 

™ e log(V (m)T X e £/ (m) ), (33) 

where 

[/(") =X ei ■■■X em u () and 
VW = vT Xe , m ...X e ,, 

and ei, . . . , e m , e[, . . . , e' m are independent samples from the distribu- 
tion v , and uq Gl4 and e V. 
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The systematic error may be bounded uniformly by 2k*r m \\w\\, while 
the sampling error at level p on J samples is bounded by 

1/2 

2IMI 



sup v T u 



ueu,v T ev 



\ogp 



2J 



(34) 



Suppose the simulated expectation is 



1 



- J2 E ^ E ^g\\V^ T (j)X e U^(j)\\ 



i=l et£M 



where 



U {m) {j) = X eiU) ■ ■ ■ X emU) u =: Y m (j)u and 
V^ T (j) = v%X e , mU) ■ ■ ■ X e[(j) =: v%Y^(j). 
We may also bound the systematic error by 



\w\ 
J 



V( sup p(Y m (j)u,Y m (j)u r ) + sup p(v T Y m (j),v' T Y m (j)) J 
i=i \u,u'eu v T ,v' T eu J 



<^^ (A (r m(j) ) + A(y m ( J ) T )) 



i=i 



(35) 



The preceding result follows as a special case of the more general 
result for Markov environments. 

Theorem 6.2. Suppose the environment process is a Markov chain 
with transition matrix P, with each e; having nonzero probability in the 
stationary distribution v. Suppose we have a smooth curve of stochastic 
matrices P^ x \ with — P, and where the parameter x takes values 
either in a two-sided interval [— eo,eo], or a one-sided interval [0,eo]. 
Let W — dP^/dx, an M x M matrix whose rows all sum to 0. The 
matrices X\ , . . . , Xu are assumed fixed. Then 



e.eEM ^ 



- E 



log 



V?X e X s U s 

HK T ^e|| 



(36) 



where Ug,V^ are independent random variables with distributions ng 
and ir e respectively. 

The quantities (g may be approximated, with error bounds, accord- 
ing to the algorithm described in section 5. 

The other part of the expression may be approximated by averaging 
samples of the form 



e,e£M 



v^ T x s x e u^ 
\\v( m )x e \\ 



(37) 
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where 

U {m) =X- eo ---X- em u 
V^ T = v^X em ..-X eo , 

and e — e"o, e"i, . . . , e m is a Markov chain with transition matrix P, 
and e = eo, ei, . . . , e m is an independent Markov chain with transition 
probabilities P , and u$ £ U and Vq £ V . 

The systematic error may be bounded uniformly by 2k2r m \\(i/\W\)\\, 
while the sampling error at level p on J samples bounded by 

2\\v T \W\\\ sup v T u(^fP) 1/2 . (38) 

Suppose the simulated expectation is 

1 w . V^ T (j)X e X s U^(j) 



where 



y(m,e)T^ 



l-X'eiO - ) * * " - X 'e m 0') 1 *0ll ||^m,e0>0|| 
v X e m (J) ' ' ' X ei(j) _. «0 y m,e(j) 



We may also bound the systematic error by 



V s\We,e\[ Sup p(Y m ,e(j)u,Y mi <;(j)u') + Sup p{v T Y m>e (j)X e , v' T Y m , e (j)X e ) ] 
J j=le,e€M \u,u'eU v T, v 'T eV ) 

1 J 

<jE E Ve\W e! e\(A(Y m>s (j))+A(Y mte (j) T X e 



j—1 e,e£Ai 



(39) 



We note that expressions like (32) and (36) are examples of what 
Bremaud (1992) calls "ersatz derivatives". In a rather different class 
of applications Bremaud suggests applying maximal coupling. 



7 Discussion 

Our results provide analytical formulas and simulation estimators for 
the derivatives of stochastic growth rate with respect to the transition 
probability matrix or the population projection matrices. We have 
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concentrated here on the theoretical results; although this may not 
be obvious, we have made considerable effort at brevity. Partly for 
this reason, we will present elsewhere numerical applications of these 
results. We expect that our results should carry over to integral popu- 
lation models (IPMs), given the strong parallels between the stochas- 
tic ergodicity properties of IPMs and matrix models (Ellner and Rees, 
2007). 

Our results apply not only to stochastic structured populations but 
to any stochastic system in which a Lyapunov exponent of a product of 
random matrices determines stability or other dynamic properties. Ex- 
amples include the net reproductive rate in epidemic models and some 
models of network dynamics. An obvious application of our results is 
to the analysis of optimal life histories, i.e., environment-to-projection 
matrix maps that maximize the stochastic growth rate. As discussed 
by McNamara (1997), this optimization problem translates into what 
is called an average reward problem in stochastic control theory, and 
so our results may be more generally useful in such control problems. 
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Appendix 



Proofs of the theorems. 



A.l Estimating the stochastic growth rate 

We prove here Theorem 4.1. The quantity we are trying to compute is 

a = E[log||X e *7||], (40) 

where (U, e) is selected from the distribution ir. Let eo, ei, e2, ... be a 
realization of the stationary Markov chain with transition matrix P. 
Let Y m :— X Cm X Cm _ 1 ■ ■ ■ X ei . Let u € U be chosen, and let £/ be a 
random variable with distribution 7r eo . Then a = E[log j|X erii+1 Y m J7||/||y m 
which may be approximated by E[log ||X e?ii+i y m uo||/||l^ n 'Uo||]. 
If we identify systematic error with bias, this is 



Error sys = 



E 



log 



\Xe m+1 Y m U \ 



\\Y m u Q \\ 



E 



log- 



lYmU\\ 



Y m U\\ 



since (Y m U/\\Y m U\\,e m ) also has the distribution ir (if Y m and U are 
taken to be independent). Thus 



Error sys < E 



< E 

< fc 2 r 



loe 



\X em+1 Y m U \ 



sup 



, I \x„. 

log- 



log 



\X em+1 Y m U || 



i±l^-log 



y m u 

e m +i 1 m 



\Y m W 



by (6). The corresponding bound on the sampling error may be com- 
puted from (14). 

For a particular choice of of e\, . . . , e m +i and U we can also repre- 
sent the random systematic error as 



log 



\X em+1 Y m u \ 



, \\Xe m+ 

log 



x Y m U\\ 



\\Y m u \\ ~° \\Y m U\ 
which may be bounded by the summand in (18). 



A. 2 Estimating sensitivities: Matrix entries 

We prove here Theorem 4.2. As discussed at the end of section 2.1, 
we may assume that the compact sets U and V are stable and satisfy 
the bounds of section 2.1 simultaneously for all X^\ The station- 
ary distributions corresponding to products of the perturbed matrices 
are denoted tt^ and tt^\ and the corresponding regular conditional 
distributions are iri^ and tt^ 
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The derivative a'(0) may be written as 



lime" 1 lim E 

e— >0 \ m—too 



(<0 y(<0 



log- 



XM 



IX 



(e) 



lim E 

m— >oo 



log- 



•X ei u | 



m— 1 r 

lim lim E e _1 (log- 

s=l L 



'(e) 



( e ) v-0) Y y „ II 

[xljxlj,! ■ ■ ■ xj} +1 X es ■ ■ ■ X eQ Uo\\ 
\\xi2-i ' ' ' Xe s \ 1 X £g ■ ■ ■ X ei Uo\\ 



\\X, 



log 



where eo, ei, . . . is a realization of the stationary Markov chain with 
transition probabilities P. Define a s , m (e) to be the summand on the 
right-hand side above. By (6), 



\a s , m {e)\ < e- 1 ^- 1 sup p(X^u, X e u) < Cr 8 " 1 , 

u&A 



where 



C = 2&2 sup max max 



\X e u 



ueu^M i<e<K (X e u)t 



Since sup e | a s,m( e )| as m ^ oo, we may exchange the 

order of the limits, to see that 



a'(0) =lim lime^V E 



XeJX^J _j • • • .Xe s +i-Xe„ • • • X eo Uo 



(0 



loe 



l-^im-i • • • XeJ +1 X es ■ ■ ■ X ea Uo\ 



E 



I y( £ ) y( £ ) y 1 V 



log 



'(e) 



X eo u \ 



I y( £ ) V' ' Y 



'(e)- 



(41) 



This limit is the same for any choice of Uo, hence would also be the 
same if we replaced uo by a random U, with any distribution on U. 
We choose U to have the distribution 7r eo , independent of the rest of 
the Markov chain. By the invariance property of the distributions n, 



a'(0) =lim lime-M VE 



I y( c ) y"( £ ) \ 1 ' : ■ 



log 



'(e) 



■Xe U\\ 



y(<) Y^Y 



E 



= lim lim e E 

m— >oo ^— ' e— >0 

s=0 



log 



log 



|XiJXeJ_! • • • Xi} +1 X eB ■ ■ ■ X ea U\\ 



\\xi e 2_ 1 ■ ■ ■ X£> +1 X e3 ■ ■ ■ X eo U\\ _ 

\xi2xi2-i ■ ■ ■ xil^Ug-iW 



'(e) 



X^2-i ' ' ' Xel'Us-i 



loe 



I y(f) y(«) y(e) y rf i 



(42) 
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where (U 3 -i,e a ) has distribution ir. 
For m > s > 1 define functions 



fs(e,S) :-- 



i T xi 5 Jxi 5 J_ 



-lTv( s ) v( s ) y^tt 



where the denominator is understood to be 1 for s = m. Take /o(e, S) := 
||xi^£/||. The summand on the right of (42) may be written as 



lime-ME[log/ s (e,e)-log/ s (0,e)]. 



e-s-0 



(43) 



By (6), 



e- 1 |Iog/ a (e,5) - log/ s (0,5)| < k 2 r s e- x p{X^U,X Bs U), 

which is bounded for e in a neighborhood of 0, so the Bounded Con- 
vergence Theorem turns (43) into 



E 



limc- 1 (log/ a (e,e)-log/ 8 (0,e)) 



E 



9 log fa 

de 



(0,0) 



(44) 



since for any choice of eo, • • • , e s and U the function f s is continuously 
diffcrentiable at (0, 0) and bounded away from 0. We have, by linearity 
of the matrix product and || • ||, 



dlog/ s 
de 



(0,0) 



1 T X^ ...x 



e s + l de 



xil'Us-i 



\X em ■ ■ ■ X es U s -i\\ 

_ l T ^e m _i ' ' ' -^e 3 + i jfeX^Us-i 
||^e m _i ' ' ' X Cs U s -\\\ 

i T x em _ 1 ---x ea+1 x es u a . 



\\x er 

l 1 X em U„ 



■x e n 3 



\\x c 



■x e u s 



for 1 < s < 
for s = m. 



Combining this with (42) yields the telescoping sum 

l T ^e t ' ' ' X ei X eQ U 



o'(0) = lim VE 



t=l 



l T X et • • • X ei X eQ U 



m— 1 



1 X et ■ ■ ■ X ei X eo U 



l T X et ---x ei X eo U 



lim E 



1 -^e m ' ' ' X ei X eo U 



l T X em ■ ■ ■ X ei X eo U 
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where in the last line (U, eo) has the distribution ir. Define :— 



1 X„ ■ ■ ■ X K 



\1 2 X em ■••X ei ||. Then converges in distribution 



to V, with distribution 7r eo , and so 



a'(0) 



lim E 



v vlx e u 



VIXM 



r v?x e u e 

V7XM, 



which is identical to (19). 

Now we estimate the error. We use the representation 



jj Xs x ■ ■ ■ Xe m Up y T 

\\Xex ■ ■ ■ Xe m U \\ 



v T x e „ 



■X, 



\V T X e 



■••x K 



where Uq and V T are assumed to have distributions 7rg m and 
respectively. Then 



V( m ) T X P U( m ) 



- log 



V T XM 



V T XJJ 



< 2 (e^ v ' 

< 2 (e ik 



^ T )+2p(U,U <m) ) 

vx e u 



-') 



1 



VX e U' 



vx e u 

VXM 



(45) 



by (6). This implies the uniform bound on systematic error, and the 
bound on sampling error (22) follows from applying (14) to a trivial 
bound on the terms in the average. The simulated bound (23) also 
follows directly from (45). 



A. 3 Estimating sensitivities: Markov environments 

We prove here Theorem 6.2 by a combination of the coupling method 
and importance sampling. We use importance sampling for the actual 
computation, but coupling provides a more direct path to validating 
the crucial exchange of limits. Suppose we are given any e such that 
P + eW and P — eW are both stochastic matrices. 

Given two distributions q and q' on {1, ... , M}, we define a standard 
coupling between q and q'. Suppose we are given a uniform random 
variable w on [0, 1]. Let A4- := {e : q e < q' e } and A4+ := {e : q' e < 
q e }. Let S := J2eeM-(<le - Qe) = Eeex+fe - Q'e)- We define three 
random variables e on M, e+ on A4+, and e_ on A4-, according to 
the following distributions: 

P{e = e} = min{g e ,^}/(l-£), 
P{e+ = e} = (q e - q' e )+/8, 
P{e_ =e} = (q' e - q e )+/5. 
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The joint distribution is irrelevant, but for definiteness we let them be 
independent. Then we define the coupled pair (e, e') by 



(e, e) if us > 5; 
(e+, e_) if oj < 5. 



(46) 



Then e has distribution q, e' has distribution q' , and e = e' with 
probability 1 — <5. This <5 is called the total-variation distance between 
q and q' . 

We write Ep for the expectation with respect to the distribution 
that makes eo, . . . , e m a stationary Markov chain with transition matrix 
P. Define to be the stationary distribution corresponding to p( e \ 
and define P^ to be the time-reversed chain of P^ . We define 



g{m,e;u) := E P 



(e) 



log- 



X* X e 



■■■X eo u\\ 



\X £m _ 1 ■ ■ -X eo u\\ 



log- 



Xe m X e 



■ --XeoUW 



\Xe m ^ 1 ■ ■ ■ X eo u\\ 



By the time-reversal property 



g(m,e;u) = Ep (c) 



log 



\X eo X ei ■ ■■ X Em u\ 
\\X ei ---X em u\\ 



log 



\X ea X ei ■ ■ ■ X erri u\\ 



\x ei ■ ■ ■ X e 



For e > we couple a sequence eo, . . . , e m selected from the distri- 



bution P to a sequence e 



(<0 



,(<0 



selected from the distribution P^ 



as follows: We start by choosing (eo,eQ £ ^) according to the standard 
coupling of (y, v^). Assume now that we have produced sequences 
of length i, ending in ej_i and e-l\. We then produce (e^e-^) ac- 
cording to the standard coupling of row ej_i of P to row ef\ of P( £ >. 
(To simplify the typography in some places, we use e(i) and e^(i) 
interchangeably with e» and e- e \) 

Let (5 = <5(e) be the maximum of the total variation distance be- 
tween v and v^> , and all of the pairs of rows. It is easy to see that 
there is a constant c such that 5 < ce for e sufficiently small. Define 
lo\, 0J2, ■ ■ ■ to be an i.i.d. sequence of uniform random variables on [0, 1], 
and two sequences of random times as follows: T := So ■= — 1, and 

T i+ i = min{t > 5» : w t < £}, 



i+l 



mm 



{t >T l+1 : e{ e) =e t }. 



Thus, — e t for all Si <t < T i+1 . Define for any u eU the random 
vector 

TT .. X e ( t ) ■ ■ ■ X e ( t+m )U 
Ut ■— hm j — 1 — 77, 

m^oo ||A e(t ) • • • A e(t+m )U || 
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and define a version of g conditioned on T\ and T 2 



ff(m,e;u;Ti,T 2 ) := Ep (c 



Then for any u Gli, 



log 



H-X'e 1 "-^e ra «|| 



Ti,T 2 



log 



\X eo X ei ■ ■ ■ X £m u\ 
\\X ei ---X em u\\ 



T U T 2 



V w a(P) = a'(O) = lim lira e _1 E [5(771, e; u; Ti, T 2 )l . (47) 



We also define 



7(e;T 1; T 2 ) :=E 



log 



H^eW(O) ' ' --^e(')(Si-l)^gill 
ll^e(«)(l) ' ' •^e(0(S 1 -l)C / Si|| 

\\ X e(0) X e(l) ■ ■ ■-X'e(Si-l)^Sfill 



- log- 



||^e(l) ' ■•^e(Si-l)^Si| 



Tl,T 2 



We break up these expectations into their portion overlapping three 
different events: 

(i) {m<T 1 }; 

(ii) {T 2 >m>T 1 }; 
(hi) {m>T 2 }. 



On the event {to < Ti} we have g(m, e; u; Ti, T 2 ) = 0, and T\ — to is 
geometrically distributed with parameter <5. By (6), 7 is bounded by 
k 2 r Sl ~ 1 < fc 2 r Tl_1 , meaning that 



E 



|7(e;Ti,T 2 )- 5 (m,e;u;Ti,T 2 )| 



Tl > 771 



E 



|7(e;7i,T 2 )| 



Tl > 771 



< fc 2 E [r^- 1 I Ti > to] 



< 



C 2 m-1 



1 -r 



(48) 



On the event {T 2 > to > Ti}: We have e (e) (i) = e(i) for i < Ti and 
for S\ < i < m. Thus, if Si < to, 

f^Si = ^e(Si) ' ' ' -^e(m)^m+l/||^e(Si) ' ' ' X e ( m) U m+ i\\ 

= X e (.)( Sl ) ■ ■ ■ X e (t)( m )U m+ i/\\X e( t )(Si) ■ ■ ■ X e( , )(m) U m+ i\\ 
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Thus we may write 
\j(e;T 1 ,T 2 ) - g(m,e;u;T 1 ,T 2 )\ 

j \\ X e(0) ■ ■ ■ ^epWl^eM(7\) ' ' ' X e(')(m)U' . ', 



< 



E 



ll-^e(l) ' ' ' X e{T 1 -l) X e^(T 1 ) ' ' ' X e(*)(m)U' 
- E 



Ti,T 2 

• • • -X\,(e 



+ 



E 



log 



< 2/c 2 r r ' 



where 



j ll^e(O) ' ' j^e(7W)^e^)(7\) ' ' ' A e('>(»? „ 
||^e(l) ' ' ■-X'e(Ti-l)-X'eW(Ti) ' ' ' X e^ (m) u \\ 

ll^e(0)^e(l) ' ' ' X e(m)U"'. 
\\ X e(l) ■ ■ ■ X e( m )U"\\ 

_ j \\ X e(0) ■ ■ ■ X e(T 1 -l) X e(T 1 ) ' ' • X e{m)u\\ 
\\ X e(l) ■ ■ ■ X e(T 1 -l) X e(T 1 ) ' ' ' X e(m) u \\ 



Ti,T 2 



Ti,T 2 
(49) 



[/' = 



and C/" 



X e(«)(m+1) ••• X e(')(S 1 -l)' 7s i 
ll X e( = )( m + l)-" X e<«)( Sl -l) C/s l- 

'?7 m +i if Si < m 

, l|A (m+1) ---X (Sl _i ) (7s 1 1| 



if 5*1 < to, 
if S*i > to. 



On the event {T 2 < m}: The above approach shows that 

|7(e;Ti,T 2 )-0(m,e;u;Ti,T 2 )| < 2fc 2 r T2 ~ 1 . (50) 



Combining (48), (49) and (50), we obtain 

| 7 (e;Ti,T 2 )-(/(m,e;«;Ti,T 2 )| 

< k 2 r Tl - 1 l {Tl>m} + 2fc 2 r m l {Tl < m} + 2fc 2 r T2 - 1 . 

Taking the expectation with respect to the distribution of T\ and T 2 , 
using the fact that T\ and T 2 — S\ are independent with distribution 
geometric with parameter S, we obtain 



E 



|7(e;Ti,T 2 )-«/(m,e;u;Ti,T 2 )| 



A- 2fr ( 52 ) 

< J^-r—M + 2fc 2 W m + 2 ,f 2 , 2 <5 2 . 
1 — r r z (l — r) z 



28 



Since S is bounded by a constant times |e|, we may find a constant C 
such that (by the triangle inequality) for all e, positive integers to, and 
ueU, 



E[ 7 (e;Ti,T 2 )] — E [g(m, e; «; Ti, T 2 )] 

<E[|7(e;ri,T2)-fl(m,e;«;ri,T 2 )| 
<C(TOr m |e|+e 2 ). 

This bound allows us to exchange the limits in (47): 
a'(0) = lim lim e _1 E [5(771, e; u; T u T 2 )\ 
= lime^E^K^Ti.Ta)] 
= lim lim e _1 E [<?(m, e; u; Ti, T2)] 



(53) 



(54) 



lim — 



E 



e=0 



p(«) 



log 



\ X e m X em _ 1 ■ ■ ■ X eo u\. 



Now we apply the method of importance sampling. We may assume 
without loss of generality that W(e, e') = whenever P(e, e') = (us- 
ing the analyticity of a, and the fact that the formula (36) is nonsin- 
gular on the nonnegative orthant).For any function Z : M. m+1 — )■ R, 

Ep (0 [Z(e , . . . , e m )] = E P [Z(e , . . . , e m )F(e 

where F is the Radon-Nikodym derivative 



F(e;e , . . . ,e m ) 



dP 



(e ,...,e m ) 



"«o f = o P{ei,e i+ i) 



This allows us to rewrite 
d 



a'(O) = lim , 



E P 

£=0 



r (e) m-1 p( e )/ n 

e fi P( e )( ei ,e l+1 ) 



x log 



(55) 



For any fixed to, there is an upper bound on e (F(e; eo, . . . , e m ) — 1), 
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so we may move the differentiation inside the expectation, to obtain 



a'(0) = lim E P 



(e) m-1 p( e )/ x 



X loe 



|^e m ^e m _l ' ' ' Ae U ll 



• ■•^eo u ll 



lim E P 



_! du, 



(<0 



e=0 



log 



\\X, 



m — 1 



+ lim V E P 



i=0 



i+lj 



P(e l ,e i+ i) 



log- 



e m — 1 



•^eo u ll 



•••^eo U ll 



I IX,. 



• • -X eo u\ 



(56) 

The first limit is 0. To see this, rewrite it as a sum over the possible 
values of eo: 



lim y v^Ep 



seM 



_1 rf^, 



(0 



<ie l e =o 



log- 



• • • XsuW 



\X Em _ 1 ■ --XeUW 



= lim 



dv^ 



m— too * — ' de 

eeM 



-E f 



log 



\\Xe m X en 



■ X ei XeU\\ 



\Xem-! ■ ■ ■ X ei XeU\\ 



Since is a probability distribution, it must be that Y^zLi ~~ir = ®- 
Thus, the expression in the limit becomes if we replace the expecta- 
tion by a constant, independent of S. By Lemma A.l it follows that 
the limit is 0. 

To compute the other limit, we sum over all possible pairs (ej, e»+i) = 
(S, e). The summand becomes 



v(e)W(e,e)E 



e,e£M 



log- 



X„ A. 



■■■^eo«ll 



\X em _,---X eo u\\ 



e, e i+ i = e 



(57) 



In order to analyze this, we need to consider the distribution of e , . . . , e m , 
conditioned on a — S and e i+ i = e. By the Markov property, this splits 
into two independent Markov chains: e = e i+1 ,...,e m is a Markov 
chain of length m — i, with transition probabilities P and starting 
point e, while S = ej, ej_i, . . . , eo is a Markov chain of length i + 1 with 
transition probabilities P and starting point S. Define two indepen- 
dent infinite sequences So, Si, . . . and ej+i, Ci + 2 7 ■ • • , which are Markov 
chains with transitions P and P respectively, beginning in S — S and 
e i+ i = e. Define for i > 1, L/j(S) := X^X^ ■ ■ ■ X^u with Uo(e) := 1, 
and Vf(e) := l 1 ^-! • • • X ei X e with T/ T (e) := 1 T Also define 



K(e) := 



\Ui(e)\ 



V/ (e) := 



l^ T (e) 
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Since \\u\\ = l T u for any nonnegative column vector u, the expression 
(56) becomes 



( m-l , 

o'(0) = E »^ W ^ e U [ ™J E ( E [log^i(e)^m-i(g 

e.egjM ^ i=0 ^ 

Eflogyf^C/^^e)] 
= £ v{e)W{e,e) lim (e [log ^(e)!!] 



m— 1 



z=0 



- E [log ||Vf(e)||] + E[logV£ 1 (e)!7 i (e)] 

-E[log^ T (e)^( e ~)] 
= ]T !/(g) lim ^ W(g,e)E[log||V^(e)||] 



(58) 



eeM 



m—1 



+ V i/(e)W(e,e) lim V E 



e,e£M 



i=0 



log 



^ (e)C/ m _,(e) 



In the last line we have used the fact that X^eeM e ) = 0' which 
means that X) e ex e)E [log || Vo T ( e )ll] = as well, since V$ (e) = 
1 T is independent of e. The same reasoning implies that if we define 
Vj T (^) to be the version of V? started in the stationary distribution 
— for instance, starting from realizations of V^ T (e), define \v) to be 
equal to V^ T (e) with probability v e — then J2 e eM W(e, e)E [log || V^{v)\ 
0. The first term on the right-hand side of (58) may then be written 
as 



V v{e)W{e,e) lim E 



lot! 



\V*(e) 



(59) 

To compute the second term, we note that U(e) := lim^oo Ui(e) 
exists, with distribution irg, and p(Ui(e), U(e)) < kir % ; similarly, V T (e) :- 
lim^oo V^(e) exists, with distribution 7r e , and p(Vi(e), Vi + \{e)) < 
k 2 r\ Thus 



\ogV^ +1 {e')U m ^{e)^ogV^{e')U m ^{e)\ < p(Vi(e'),V i+1 (e')) < k 2 r\ 
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We break up the sum on the right-hand side of (58) into three pieces: 
]T E [\ogV^ +l {e)U m ^{e) - log V? (e)U m -i(e)] 



0<i<m-l 



= J2 E[log^ 1 ( e )(7(p:)-log^ T (e);7(g) 

0<i<m/2 



0<i<m/2 



. V^MUrn-de) V?(e)U m -i(e) 



V/ (e)U(e) 



+ ]T E HV^ +1 (e)U m ^(e) - ]ogV^{e)U m -i{e)] . 

m/2<i<m— 1 



The first sum telescopes to 



E 



lo g y i 7 ; m/2 (e)[/(e)-logt/ r ( e )C/(g) =E logT^^eMe) 



applying the fact that V T = 1 T , so that V r T (e)J7(g) = \\U(e)\\ = 1. 
Applying (4), the second and third sums are bounded by 



2k 2 r m - l + Y, ^r l < 



Thus 



0<i<m/2 



lim V E 

Ti— !-00 ^ — ' 
2-0 



m/2<i<m— 1 



3fc 2 r m / 2 
1 -r 



lot 



V^ 1 (e)£/ rn _ < (e) 



V/ (e)C/ m _ 4 (e) 



E[F T (e)?7(e) 



(60) 



completing the proof of Theorem 6.2. 

Lemma A.l. For any u,u' £ W and e,e' G A4, if we let eo, ei,... 

and e' ,e[,. . . be realisations of the Markov chain P starting at eo = e 
and e = e' respectively. Then 



E 



log 



|A em+1 • • • X eo n|| 



\Xe m ■ ■ ■ X eo u\\ 



-E 



' W X e' m+1 '-- X e'o U '\ 

log- 



• X e / u' 



< 



k 2 D 
l-r 



where £ and D are the constants that satisfy (30). 



(m+l)(£Vr)"\ 
(61) 



Proof. Using the maximal coupling, we create coupled versions of (e, e') , 
such that the coupling time r satisfies 



Define 



X eo u 



\v-F t {e!,-)\\<2D?. 



\X eT _ t ---X eo u\\ 



\x P 



■■■X,,u' 
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Then by the bound (6), 



E 



log 



\ X e m+l ■ ■ -X eo u\ 



-E 



\ W X e' m+1 --- X e'o U '\ 

log- 



\X K 



■X e , u'\\ 



< E 



, \\X em+1 ---X eo u \\X e , ■■■Xju'W 



= E 
< E 
< 



log 

E [k 2 r m - 7 
k 2 D ^ 



X Cm ■ ■ ■ X e(j u\\ 
\X Cm+1 ■ --X^Url 



\x„ 



■X eo u\\ 



log 



X e , u'll 



□ 
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